Active Flash: towards energy-efficient, in-situ data analytics on extreme-scale machines
Abstract
Modern scientific discovery is increasingly driven by large-scale supercomputing simulations, followed by data analysis tasks. These data analyses are either performed offline, on smaller-scale clusters, or in-situ, on the supercomputer itself. Both of these strategies are rife with storage and I/O bottlenecks, energy inefficiencies due to increased data movement, and increased time to solution. We propose Active Flash, a novel approach to in-situ, scientific data analysis, wherein data analysis is conducted on the solid-state device (SSD), where the data already resides. Our performance and energy models show that Active Flash has the potential to address many of the aforementioned concerns. In addition, we demonstrate an Active Flash prototype built on a commercial SSD controller, which further reaffirms the viability of our proposal.
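The abstract does not reproduce the performance and energy models themselves. The Python sketch below is a rough, hypothetical illustration of the trade-off they address, not the authors' model. Host-side analysis pays to move data off the SSD and then compute on a fast, power-hungry CPU. In-SSD analysis skips the transfer but runs on a slower, low-power controller. Every name (`Platform`, `joules_per_gb`) and every number is an assumption chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Platform:
    """Hypothetical compute platform; all values used with it are illustrative."""
    name: str
    throughput_gbps: float  # analysis throughput, GB of input processed per second
    active_power_w: float   # power drawn while running the analysis, in watts


def transfer_energy_j(data_gb: float, joules_per_gb: float) -> float:
    """Energy to move data off the SSD (e.g. over PCIe or the network), in joules."""
    return data_gb * joules_per_gb


def compute_time_s(data_gb: float, p: Platform) -> float:
    """Time to analyse data_gb gigabytes on platform p, in seconds."""
    return data_gb / p.throughput_gbps


def compute_energy_j(data_gb: float, p: Platform) -> float:
    """Compute energy = power * time."""
    return p.active_power_w * compute_time_s(data_gb, p)


def host_analysis_energy(data_gb: float, host: Platform, joules_per_gb: float) -> float:
    """Offline/host-side analysis: data must first be moved off the SSD."""
    return transfer_energy_j(data_gb, joules_per_gb) + compute_energy_j(data_gb, host)


def in_ssd_analysis_energy(data_gb: float, controller: Platform) -> float:
    """In-SSD analysis: data is processed where it resides, so no transfer term."""
    return compute_energy_j(data_gb, controller)


# Hypothetical parameters, not measurements from the paper.
host = Platform("host CPU", throughput_gbps=2.0, active_power_w=100.0)
ctrl = Platform("SSD controller", throughput_gbps=0.5, active_power_w=5.0)

if __name__ == "__main__":
    data_gb, joules_per_gb = 10.0, 20.0
    print("host:", host_analysis_energy(data_gb, host, joules_per_gb), "J,",
          compute_time_s(data_gb, host), "s")
    print("ssd: ", in_ssd_analysis_energy(data_gb, ctrl), "J,",
          compute_time_s(data_gb, ctrl), "s")
```

With these made-up numbers, the host path uses 700 J in 5 s and the in-SSD path uses 100 J in 20 s. This shows the tension such a model has to resolve: lower energy and no data movement, in exchange for slower per-byte computation on the controller.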